Chapter 1: Hello Data

Overview

  • 1.1 Case study: Using stents to prevent strokes
  • 1.2 Data basics
    • 1.2.1 Observations, variables, and data frames
    • 1.2.2 Types of variables
    • 1.2.3 Relationships between variables
    • 1.2.4 Explanatory and response variables
    • 1.2.5 Observational studies and experiments

1.1 Case study: Using stents to prevent strokes

Does the use of stents reduce the risk of stroke?

  • Treatment group. Patients in the treatment group received a stent and medical management. The medical management included medications, management of risk factors, and help in lifestyle modification.
  • Control group. Patients in the control group received the same medical management as the treatment group, but they did not receive stents.

Researchers randomly assigned 224 patients to the treatment group and 227 to the control group.

Group activity

             30 days               365 days
Group        Stroke    No event    Stroke    No event
Control      13        214         28        199
Treatment    33        191         45        179

  1. Of the 224 patients in the treatment group, 45 had a stroke by the end of the first year. Using these two numbers, compute the proportion of patients in the treatment group who had a stroke by the end of their first year.

  2. Compute the proportion of patients in the control group who had a stroke by the end of the first year.

  3. What do these two summary statistics tell us about the research question “Does the use of stents reduce the risk of stroke?”
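
The arithmetic behind the first two questions can be sketched in a few lines of Python. This is only an illustration: the function name `stroke_proportion` is hypothetical, and the counts are copied from the 365-day columns of the table above.

```python
def stroke_proportion(strokes: int, group_size: int) -> float:
    """Proportion of patients in a group who had a stroke."""
    return strokes / group_size


# Counts at 365 days, taken from the table above.
treatment = stroke_proportion(45, 224)
control = stroke_proportion(28, 227)

print(f"treatment: {treatment:.3f}")
print(f"control:   {control:.3f}")
print(f"difference (treatment - control): {treatment - control:.3f}")
```

The two proportions are summary statistics: each condenses one group's outcomes into a single number, and the sign of their difference shows which group had more strokes in this sample.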

Statistical investigations

Steps in a statistical investigation:

  • Formulate a research question.
  • Collect data.
  • Summarize the data using summary statistics.
  • Interpret the results in the context of the research question.

Statistics is the study of how best to collect, analyze, and draw conclusions from data.

1.2 Data basics

1.2.1 Observations, variables, and data frames

  • A data frame is a table (i.e., a spreadsheet).
    • Each row corresponds to an observational unit: the individual entities under consideration.
    • Each column corresponds to a variable: properties that can be observed on each observational unit.
loan_amount interest_rate term grade state total_income homeownership
22000 10.90 60 B NJ 59000 rent
6000 9.92 36 B CA 60000 rent
25000 26.30 36 E SC 75000 mortgage
6000 9.92 36 B CA 75000 rent
25000 9.43 60 B OH 254000 mortgage
6400 9.92 36 B IN 67000 mortgage
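
As a rough sketch of the row and column structure, the first three rows of the table above can be read into Python with the standard `csv` module. The name `LOANS_CSV` is made up for this example, and treating each row as one loan is an assumption based on the column names.

```python
import csv
import io

# The first three rows of the table above, written as CSV text.
LOANS_CSV = """loan_amount,interest_rate,term,grade,state,total_income,homeownership
22000,10.90,60,B,NJ,59000,rent
6000,9.92,36,B,CA,60000,rent
25000,26.30,36,E,SC,75000,mortgage
"""

# Each dict is one row, i.e., one observational unit (assumed here to be a loan).
rows = list(csv.DictReader(io.StringIO(LOANS_CSV)))

# The keys are the column names, i.e., the variables.
variables = list(rows[0].keys())

# A single variable is a column: one value per observational unit.
interest_rates = [float(row["interest_rate"]) for row in rows]

print(len(rows), "observational units")
print(len(variables), "variables:", variables)
print("interest_rate column:", interest_rates)
```

Reading across a row gives every variable for one observational unit. Reading down a column gives one variable for every observational unit.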

county Data Frame

  • Observational units: United States counties
  • Variables: name, state, pop2000, pop2010, pop2017, pop_change, poverty, homeownership, multi_unit, unemployment_rate, metro, median_edu, per_capita_income, median_hh_income, smoking_ban
name state pop2000 pop2010 pop2017 pop_change poverty homeownership multi_unit unemployment_rate metro median_edu per_capita_income median_hh_income smoking_ban
Autauga County Alabama 43671 54571 55504 1.48 13.7 77.5 7.2 3.86 yes some_college 27841.70 55317 none
Baldwin County Alabama 140415 182265 212628 9.19 11.8 76.7 22.6 3.99 yes some_college 27779.85 52562 none
Barbour County Alabama 29038 27457 25270 -6.22 27.2 68.0 11.1 5.90 no hs_diploma 17891.73 33368 partial
Bibb County Alabama 20826 22915 22668 0.73 15.2 82.9 6.6 4.39 yes hs_diploma 20572.05 43404 none
Blount County Alabama 51024 57322 58013 0.68 15.6 82.0 3.7 4.02 yes hs_diploma 21367.39 47412 none
Bullock County Alabama 11714 10914 10309 -2.28 28.5 76.9 9.9 4.93 no hs_diploma 15444.16 29655 none
Butler County Alabama 21399 20947 19825 -2.69 24.4 69.0 13.7 5.49 no hs_diploma 17014.95 36326 NA
Calhoun County Alabama 112249 118572 114728 -1.51 18.6 70.7 14.3 4.93 yes some_college 23609.64 43686 NA
Chambers County Alabama 36583 34215 33713 -1.20 18.8 71.4 8.7 4.08 no hs_diploma 21079.51 37342 none
Cherokee County Alabama 23988 25989 25857 -0.60 16.1 77.5 4.3 4.05 no hs_diploma 23067.93 40041 none
Chilton County Alabama 39593 43643 44067 0.97 19.4 75.1 4.4 4.05 yes hs_diploma 22793.82 43501 none
Choctaw County Alabama 15922 13859 12945 -3.37 22.3 85.6 3.9 6.39 no hs_diploma 20363.87 32122 none
Clarke County Alabama 27867 25833 24083 -4.12 25.3 80.0 6.3 8.48 no hs_diploma 20099.21 33827 none
Clay County Alabama 14254 13932 13367 -0.29 19.1 72.8 11.2 4.37 no hs_diploma 20879.67 37287 none
Cleburne County Alabama 14123 14972 14900 -0.51 19.1 74.9 5.3 4.46 no hs_diploma 20158.98 37396 none
Coffee County Alabama 43615 49948 51874 2.42 16.1 69.7 13.6 4.39 no some_college 25627.63 49821 none
Colbert County Alabama 54984 54428 54500 -0.05 16.8 73.5 12.3 5.21 yes some_college 22915.45 45477 none
Conecuh County Alabama 14089 13228 12469 -3.40 26.4 81.6 6.0 6.14 no hs_diploma 14814.23 30434 none
Coosa County Alabama 12202 11539 10754 -4.43 14.4 83.7 1.9 4.62 no hs_diploma 19147.01 34792 none
Covington County Alabama 37631 37765 37092 -1.90 17.6 74.0 6.1 5.24 no hs_diploma 22146.06 39467 none
Crenshaw County Alabama 13665 13906 13871 0.17 17.6 67.8 9.2 4.47 no hs_diploma 20466.80 38937 none
Cullman County Alabama 77483 80406 82755 2.58 16.4 74.7 8.5 3.71 no hs_diploma 21287.64 40997 none
Dale County Alabama 49129 50251 49226 -1.07 19.6 61.2 13.2 4.43 no some_college 22299.61 44711 none
Dallas County Alabama 46365 43820 39215 -6.59 31.9 62.6 16.0 7.53 no hs_diploma 18042.21 30065 none
DeKalb County Alabama 64452 71109 71617 1.00 21.5 77.5 6.4 4.49 no hs_diploma 19007.97 38842 none
Elmore County Alabama 65874 79303 81677 1.39 13.5 77.6 7.0 3.64 yes some_college 27266.03 54981 partial
Escambia County Alabama 38440 38319 37447 -0.86 23.8 73.5 7.8 5.03 no hs_diploma 18527.38 35026 none
Etowah County Alabama 103459 104430 102755 -1.07 17.9 73.0 11.9 4.57 yes some_college 21208.43 42064 none
Fayette County Alabama 18495 17241 16468 -2.11 18.1 76.0 7.9 4.73 no hs_diploma 21654.89 36541 none
Franklin County Alabama 31223 31704 31495 -0.03 23.0 69.2 10.4 4.23 no hs_diploma 18783.03 39501 partial
Geneva County Alabama 25764 26790 26421 -0.99 22.5 71.6 6.6 4.27 yes hs_diploma 20741.87 39293 none
Greene County Alabama 9974 9045 8330 -4.82 38.8 71.1 11.1 7.66 no hs_diploma 12330.36 20954 NA
Hale County Alabama 17185 15760 14812 -2.86 26.1 74.3 6.1 5.89 yes hs_diploma 19196.55 34679 none
Henry County Alabama 16310 17302 17147 0.10 13.7 81.9 3.2 4.87 yes hs_diploma 23562.00 45569 none
Houston County Alabama 88787 101547 104346 0.70 18.5 67.4 15.2 4.34 yes some_college 23573.18 42803 none
Jackson County Alabama 53926 53227 51909 -1.92 19.0 76.6 5.8 4.77 no hs_diploma 20061.42 39281 NA
Jefferson County Alabama 662047 658466 659197 0.07 17.6 66.8 24.0 4.24 yes some_college 29259.76 49321 none
Lamar County Alabama 15904 14564 13946 -1.86 23.1 75.1 9.0 4.23 no hs_diploma 21848.70 36016 none
Lauderdale County Alabama 87966 92709 92538 -0.16 16.3 73.0 14.7 4.61 yes some_college 25827.78 44888 none
Lawrence County Alabama 34803 34339 33049 -1.57 16.6 78.7 5.1 4.88 yes hs_diploma 21896.84 43779 none
Lee County Alabama 115092 140247 161604 6.71 22.0 64.2 23.3 3.91 yes some_college 26160.00 47564 none
Limestone County Alabama 65676 82782 94402 6.19 14.8 77.1 9.4 3.98 yes some_college 25685.40 52831 none
Lowndes County Alabama 13473 11299 10076 -5.46 30.2 75.4 7.0 8.00 yes hs_diploma 18901.26 29785 none
Macon County Alabama 24105 21452 18755 -6.32 25.9 68.0 15.5 5.78 no hs_diploma 21921.66 32308 none
Madison County Alabama 276700 334811 361046 4.24 13.6 70.4 21.4 3.85 yes some_college 34416.12 61318 NA
Marengo County Alabama 22539 21027 19375 -3.63 25.6 73.5 8.3 5.65 no hs_diploma 20738.37 32255 none
Marion County Alabama 31214 30776 29833 -1.28 17.6 75.8 10.9 4.79 no hs_diploma 21682.70 35719 none
Marshall County Alabama 82231 93019 95548 1.33 21.0 72.5 9.3 3.80 no hs_diploma 21785.98 41104 partial
Mobile County Alabama 399843 412992 413955 0.01 19.3 68.4 17.7 5.18 yes some_college 23639.54 45802 none
Monroe County Alabama 24324 23068 21327 -3.83 33.7 73.8 6.0 6.74 no hs_diploma 15964.50 26036 none
Montgomery County Alabama 223510 229363 226646 -0.56 20.8 63.2 22.4 4.30 yes some_college 26622.04 46545 none
Morgan County Alabama 111064 119490 118818 -0.60 16.6 73.1 13.7 4.11 yes some_college 24454.90 47529 partial
Perry County Alabama 11861 10591 9339 -6.79 41.9 67.8 11.4 7.94 no hs_diploma 11800.68 22973 none
Pickens County Alabama 20949 19746 20176 4.53 22.9 74.1 10.1 5.29 yes hs_diploma 19887.90 36220 none
Pike County Alabama 29605 32899 33267 -0.88 26.3 56.3 18.7 4.88 no hs_diploma 20558.82 35684 none
Randolph County Alabama 22380 22913 22670 0.55 18.7 75.9 4.8 4.40 no hs_diploma 20401.61 39485 none
Russell County Alabama 49756 52947 57045 -3.51 20.9 62.3 17.8 4.30 yes some_college 20603.92 38988 none
St. Clair County Alabama 64742 83593 88199 2.52 13.7 82.2 5.5 3.97 yes hs_diploma 24731.80 53483 none
Shelby County Alabama 143293 195085 213605 4.76 8.3 80.6 11.4 3.23 yes some_college 34709.33 74063 NA
Sumter County Alabama 14798 13763 12687 -5.15 36.0 68.3 14.5 6.54 no hs_diploma 14308.71 21663 partial
Talladega County Alabama 80321 82291 80065 -1.76 19.3 73.0 9.6 4.96 no hs_diploma 21548.16 39219 none
Tallapoosa County Alabama 41475 41616 40681 -0.83 21.2 73.3 8.9 4.16 no hs_diploma 22228.57 42181 none
Tuscaloosa County Alabama 164875 194656 207811 3.52 17.3 63.3 25.4 4.14 yes some_college 24579.41 50513 none
Walker County Alabama 70713 67023 64058 -2.57 21.5 77.7 6.6 5.02 yes hs_diploma 20704.31 38872 none
Washington County Alabama 18097 17581 16531 -2.11 22.0 83.0 2.6 6.84 no hs_diploma 19907.22 42185 none
Wilcox County Alabama 13183 11670 10719 -3.87 31.9 76.8 6.0 11.40 no hs_diploma 14415.88 27012 none
Winston County Alabama 24843 24484 23722 -1.91 18.7 73.8 6.1 5.00 no hs_diploma 19518.80 35362 none
Aleutians East Borough Alaska 2697 3141 3370 1.38 16.7 59.2 11.8 2.52 no hs_diploma 30891.70 66607 NA
Aleutians West Census Area Alaska 5465 5561 5763 0.19 7.5 36.3 30.9 3.63 no hs_diploma 28443.14 85192 none
Anchorage Municipality Alaska 260283 291826 294356 -2.23 8.1 61.7 35.3 5.97 yes some_college 38324.82 82271 none
Bethel Census Area Alaska 16006 17013 18076 1.28 27.0 61.3 13.9 13.79 no hs_diploma 17802.36 53853 none
Bristol Bay Borough Alaska 1258 997 867 -8.93 7.1 56.6 13.4 6.34 no some_college 38984.10 79500 none
Denali Borough Alaska 1893 1826 2074 7.35 15.5 60.7 14.1 9.38 no some_college 39316.88 83295 none
Dillingham Census Area Alaska 4922 4847 4932 -0.98 16.6 60.7 14.1 9.29 no hs_diploma 21804.82 58708 none
Fairbanks North Star Borough Alaska 82840 97581 99703 -1.18 7.7 59.8 26.2 6.29 yes some_college 34968.82 76250 none
Haines Borough Alaska 2392 2508 2526 -1.29 8.4 74.5 13.2 9.09 no some_college 37141.41 70640 none
Hoonah Angoon Census Area Alaska 3436 2150 NA NA 11.1 64.0 8.6 12.63 no some_college 29040.32 57900 NA
Juneau City and Borough Alaska 30711 31275 32094 -1.46 7.4 64.0 32.2 4.75 no some_college 41254.44 90749 none
Kenai Peninsula Borough Alaska 49691 55400 58617 2.85 11.0 72.7 12.0 8.49 no some_college 32050.39 65279 none
Ketchikan Gateway Borough Alaska 14070 13477 13856 1.27 10.6 59.1 36.7 6.31 no some_college 32635.69 67321 none
Kodiak Island Borough Alaska 13913 13592 13448 -4.48 9.3 59.2 25.9 5.10 no some_college 29922.97 74167 none
Lake and Peninsula Borough Alaska 1823 1631 1620 -2.59 16.5 75.0 2.5 11.89 no hs_diploma 20957.43 45208 none
Matanuska-Susitna Borough Alaska 59322 88995 106532 11.14 9.8 79.2 10.1 8.66 yes some_college 27306.48 74887 none
Nome Census Area Alaska 9196 9492 9921 0.78 24.9 56.2 17.4 12.57 no hs_diploma 21509.23 53821 none
North Slope Borough Alaska 7385 9430 9782 0.06 10.2 48.3 24.9 7.44 no hs_diploma 28886.59 77266 none
Northwest Arctic Borough Alaska 7208 7523 7684 -0.48 25.3 53.7 19.4 16.82 no hs_diploma 19824.44 61533 none
Petersburg Borough Alaska 6684 3815 3281 -0.18 7.8 76.7 9.5 9.34 no some_college 33142.29 63490 NA
Prince of Wales-Hyder Census Area Alaska 6146 5559 6443 -0.05 16.0 69.0 9.7 11.42 no some_college 23990.18 52114 NA
Sitka City and Borough Alaska 8835 8881 8689 -3.05 9.2 55.9 24.4 4.68 no some_college 35002.92 70765 none
Skagway Alaska NA 968 NA NA 5.6 59.1 27.2 10.56 no some_college 38150.23 70673 NA
Southeast Fairbanks Census Area Alaska 6174 7029 6888 -1.32 13.9 65.2 17.3 10.44 no some_college 24596.05 63866 none
Valdez-Cordova Census Area Alaska 10195 9636 9278 -5.06 7.4 71.8 17.6 7.96 no some_college 30746.27 86019 none
Kusilvak Census Area Alaska 7028 7459 8202 2.91 NA 64.8 4.1 NA NA NA NA NA none
Wrangell Alaska NA 2369 NA NA 11.7 78.7 11.9 7.71 no some_college 30153.57 56094 NA
Yakutat City and Borough Alaska 808 662 605 -7.07 6.2 61.1 12.4 9.36 no some_college 29311.67 64583 NA
Yukon-Koyukuk Census Area Alaska 6551 5588 5365 -3.52 25.5 69.1 2.9 18.26 no hs_diploma 19447.74 37819 none
Apache County Arizona 69423 71518 71606 -0.95 35.9 76.3 5.2 10.42 no hs_diploma 12119.78 32360 NA
Cochise County Arizona 117755 131346 124756 -3.54 18.1 69.0 12.2 5.61 yes some_college 26016.10 47847 NA
Coconino County Arizona 116320 134421 140776 3.17 21.0 61.2 18.9 5.55 yes some_college 27266.04 53523 none
Gila County Arizona 51335 53597 53501 0.98 21.9 78.3 4.8 6.06 no some_college 22383.06 41179 NA

variable description
1 name Name of county.
2 state Name of state.
3 pop2000 Population in 2000.
4 pop2010 Population in 2010.
5 pop2017 Population in 2017.
6 pop_change Population change from 2010 to 2017 (in percent).
7 poverty Percent of population in poverty in 2017.
8 homeownership Homeownership rate, 2006-2010.
9 multi_unit Multi-unit rate: percent of housing units that are in multi-unit structures, 2006-2010.
10 unemployment_rate Unemployment rate in 2017.
11 metro Whether the county contains a metropolitan area, taking one of the values yes or no.
12 median_edu Median education level (2013-2017), taking one of the values below_hs, hs_diploma, some_college, or bachelors.
13 per_capita_income Per capita (per person) income (2013-2017).
14 median_hh_income Median household income.
15 smoking_ban Describes the type of county-level smoking ban in place in 2010, taking one of the values none, partial, or comprehensive.

1.2.2 Types of variables

Group Activity

  1. Data were collected about students in a statistics course. Three variables were recorded for each student: number of siblings, student height, and whether the student had previously taken a statistics course. Classify each of the variables as continuous numerical, discrete numerical, or categorical.

  2. An experiment is evaluating the effectiveness of a new drug in treating migraines. A group variable is used to indicate the experiment group for each patient: treatment or control. The num_migraines variable represents the number of migraines the patient experienced during a 3-month period. Classify each variable as either numerical or categorical.

1.2.3 Relationships between variables

Homeownership rate and multi-unit structures

  • homeownership: the percentage of homes that are owned by residents
  • multi_unit: the percentage of housing units that are in multi-unit structures (e.g., apartments, condos)

Are these two variables related?

name state homeownership multi_unit
Autauga County Alabama 77.5 7.2
Baldwin County Alabama 76.7 22.6
Barbour County Alabama 68.0 11.1
Bibb County Alabama 82.9 6.6
Blount County Alabama 82.0 3.7
Bullock County Alabama 76.9 9.9
Butler County Alabama 69.0 13.7
Calhoun County Alabama 70.7 14.3
Chambers County Alabama 71.4 8.7
Cherokee County Alabama 77.5 4.3
Chilton County Alabama 75.1 4.4
Choctaw County Alabama 85.6 3.9
Clarke County Alabama 80.0 6.3
Clay County Alabama 72.8 11.2
Cleburne County Alabama 74.9 5.3
Coffee County Alabama 69.7 13.6
Colbert County Alabama 73.5 12.3
Conecuh County Alabama 81.6 6.0
Coosa County Alabama 83.7 1.9
Covington County Alabama 74.0 6.1
Crenshaw County Alabama 67.8 9.2
Cullman County Alabama 74.7 8.5
Dale County Alabama 61.2 13.2
Dallas County Alabama 62.6 16.0
DeKalb County Alabama 77.5 6.4
Elmore County Alabama 77.6 7.0
Escambia County Alabama 73.5 7.8
Etowah County Alabama 73.0 11.9
Fayette County Alabama 76.0 7.9
Franklin County Alabama 69.2 10.4
Geneva County Alabama 71.6 6.6
Greene County Alabama 71.1 11.1
Hale County Alabama 74.3 6.1
Henry County Alabama 81.9 3.2
Houston County Alabama 67.4 15.2
Jackson County Alabama 76.6 5.8
Jefferson County Alabama 66.8 24.0
Lamar County Alabama 75.1 9.0
Lauderdale County Alabama 73.0 14.7
Lawrence County Alabama 78.7 5.1
Lee County Alabama 64.2 23.3
Limestone County Alabama 77.1 9.4
Lowndes County Alabama 75.4 7.0
Macon County Alabama 68.0 15.5
Madison County Alabama 70.4 21.4
Marengo County Alabama 73.5 8.3
Marion County Alabama 75.8 10.9
Marshall County Alabama 72.5 9.3
Mobile County Alabama 68.4 17.7
Monroe County Alabama 73.8 6.0
Montgomery County Alabama 63.2 22.4
Morgan County Alabama 73.1 13.7
Perry County Alabama 67.8 11.4
Pickens County Alabama 74.1 10.1
Pike County Alabama 56.3 18.7
Randolph County Alabama 75.9 4.8
Russell County Alabama 62.3 17.8
St. Clair County Alabama 82.2 5.5
Shelby County Alabama 80.6 11.4
Sumter County Alabama 68.3 14.5
Talladega County Alabama 73.0 9.6
Tallapoosa County Alabama 73.3 8.9
Tuscaloosa County Alabama 63.3 25.4
Walker County Alabama 77.7 6.6
Washington County Alabama 83.0 2.6
Wilcox County Alabama 76.8 6.0
Winston County Alabama 73.8 6.1
Aleutians East Borough Alaska 59.2 11.8
Aleutians West Census Area Alaska 36.3 30.9
Anchorage Municipality Alaska 61.7 35.3
Bethel Census Area Alaska 61.3 13.9
Bristol Bay Borough Alaska 56.6 13.4
Denali Borough Alaska 60.7 14.1
Dillingham Census Area Alaska 60.7 14.1
Fairbanks North Star Borough Alaska 59.8 26.2
Haines Borough Alaska 74.5 13.2
Hoonah Angoon Census Area Alaska 64.0 8.6
Juneau City and Borough Alaska 64.0 32.2
Kenai Peninsula Borough Alaska 72.7 12.0
Ketchikan Gateway Borough Alaska 59.1 36.7
Kodiak Island Borough Alaska 59.2 25.9
Lake and Peninsula Borough Alaska 75.0 2.5
Matanuska-Susitna Borough Alaska 79.2 10.1
Nome Census Area Alaska 56.2 17.4
North Slope Borough Alaska 48.3 24.9
Northwest Arctic Borough Alaska 53.7 19.4
Petersburg Borough Alaska 76.7 9.5
Prince of Wales-Hyder Census Area Alaska 69.0 9.7
Sitka City and Borough Alaska 55.9 24.4
Skagway Alaska 59.1 27.2
Southeast Fairbanks Census Area Alaska 65.2 17.3
Valdez-Cordova Census Area Alaska 71.8 17.6
Kusilvak Census Area Alaska 64.8 4.1
Wrangell Alaska 78.7 11.9
Yakutat City and Borough Alaska 61.1 12.4
Yukon-Koyukuk Census Area Alaska 69.1 2.9
Apache County Arizona 76.3 5.2
Cochise County Arizona 69.0 12.2
Coconino County Arizona 61.2 18.9
Gila County Arizona 78.3 4.8
Graham County Arizona 72.0 7.7
Greenlee County Arizona 46.9 6.1
La Paz County Arizona 75.4 3.6
Maricopa County Arizona 66.3 25.1
Mohave County Arizona 71.5 9.8
Navajo County Arizona 72.5 7.0
Pima County Arizona 64.6 22.9
Pinal County Arizona 77.7 6.3
Santa Cruz County Arizona 71.0 17.6
Yavapai County Arizona 72.5 11.0
Yuma County Arizona 69.6 12.5
Arkansas County Arkansas 64.4 10.9
Ashley County Arkansas 72.4 8.5
Baxter County Arkansas 76.6 10.1
Benton County Arkansas 70.1 15.6
Boone County Arkansas 72.7 12.1
Bradley County Arkansas 70.2 8.0
Calhoun County Arkansas 82.1 2.4
Carroll County Arkansas 69.5 11.6
Chicot County Arkansas 69.6 10.0
Clark County Arkansas 67.6 15.9
Clay County Arkansas 74.0 8.1
Cleburne County Arkansas 77.9 5.7
Cleveland County Arkansas 78.0 3.1
Columbia County Arkansas 69.8 11.4
Conway County Arkansas 75.8 6.3
Craighead County Arkansas 61.2 20.0
Crawford County Arkansas 73.0 10.8
Crittenden County Arkansas 58.2 23.1
Cross County Arkansas 70.7 11.3
Dallas County Arkansas 70.5 7.9
Desha County Arkansas 59.1 17.2
Drew County Arkansas 67.5 7.7
Faulkner County Arkansas 66.4 17.3
Franklin County Arkansas 78.8 4.7
Fulton County Arkansas 79.6 3.4
Garland County Arkansas 70.1 16.7
Grant County Arkansas 80.3 2.4
Greene County Arkansas 65.7 13.5
Hempstead County Arkansas 68.4 10.2
Hot Spring County Arkansas 76.0 4.4
Howard County Arkansas 69.6 8.2
Independence County Arkansas 72.8 7.3
Izard County Arkansas 79.7 4.5
Jackson County Arkansas 69.8 15.0
Jefferson County Arkansas 64.4 14.8
Johnson County Arkansas 68.8 11.3
Lafayette County Arkansas 79.2 5.4
Lawrence County Arkansas 67.2 8.1
Lee County Arkansas 66.3 13.9
Lincoln County Arkansas 69.3 8.4
Little River County Arkansas 71.2 9.4
Logan County Arkansas 79.1 4.0
Lonoke County Arkansas 74.4 8.6
Madison County Arkansas 75.3 3.6
Marion County Arkansas 81.6 4.7
Miller County Arkansas 66.3 18.7
Mississippi County Arkansas 59.9 16.8
Monroe County Arkansas 61.4 16.1
Montgomery County Arkansas 82.8 2.5
Nevada County Arkansas 71.3 5.9
Newton County Arkansas 79.7 4.0
Ouachita County Arkansas 69.8 10.5
Perry County Arkansas 81.7 2.3
Phillips County Arkansas 54.7 15.7
Pike County Arkansas 74.8 6.1
Poinsett County Arkansas 66.4 11.5
Polk County Arkansas 77.4 3.7
Pope County Arkansas 69.7 14.1
Prairie County Arkansas 72.6 5.8
Pulaski County Arkansas 60.4 24.9
Randolph County Arkansas 76.6 6.8
St. Francis County Arkansas 58.7 15.2
Saline County Arkansas 77.7 7.8
Scott County Arkansas 75.1 6.4
Searcy County Arkansas 75.0 4.7
Sebastian County Arkansas 63.4 22.3
Sevier County Arkansas 74.2 6.1
Sharp County Arkansas 80.8 3.6
Stone County Arkansas 80.4 1.4
Union County Arkansas 71.2 8.8
Van Buren County Arkansas 78.5 9.9
Washington County Arkansas 56.3 30.1
White County Arkansas 69.0 13.8
Woodruff County Arkansas 61.5 12.9
Yell County Arkansas 70.1 7.3
Alameda County California 55.1 38.0
Alpine County California 73.4 38.4
Amador County California 77.3 7.5
Butte County California 61.2 19.7
Calaveras County California 78.8 3.7
Colusa County California 64.4 14.3
Contra Costa County California 69.5 23.8
Del Norte County California 60.9 14.6
El Dorado County California 76.5 12.1
Fresno County California 55.0 25.8
Glenn County California 67.5 11.4
Humboldt County California 57.6 18.9
Imperial County California 56.6 21.4
Inyo County California 64.0 13.1
Kern County California 61.4 18.1
Kings County California 56.0 18.5
Lake County California 67.1 7.8
Lassen County California 63.7 9.7
Los Angeles County California 48.2 41.8
Madera County California 63.0 12.0
Marin County California 64.0 27.0
Mariposa County California 70.0 8.0
Mendocino County California 62.8 13.1
Merced County California 55.9 17.3
Modoc County California 70.2 5.3
Mono County California 56.4 51.1
Monterey County California 53.4 26.6
Napa County California 65.1 19.7
Nevada County California 74.0 9.5
Orange County California 60.8 33.7
Placer County California 72.9 17.1
Plumas County California 65.6 6.6
Riverside County California 70.0 16.1
Sacramento County California 59.5 26.9
San Benito County California 64.3 14.3
San Bernardino County California 65.1 18.8
San Diego County California 55.9 35.5
San Francisco County California 37.5 66.6
San Joaquin County California 61.7 18.7
San Luis Obispo County California 61.4 17.7
San Mateo County California 61.1 32.4
Santa Barbara County California 54.1 29.5
Santa Clara County California 59.2 32.8
Santa Cruz County California 59.6 21.4
Shasta County California 66.0 16.0
Sierra County California 80.1 4.5
Siskiyou County California 65.2 14.4
Solano County California 65.8 21.4
Sonoma County California 62.4 18.8
Stanislaus County California 62.1 16.5
Sutter County California 61.8 20.0
Tehama County California 65.1 11.6
Trinity County California 73.5 7.1
Tulare County California 59.3 14.4
Tuolumne County California 70.2 8.6
Ventura County California 66.4 20.2
Yolo County California 54.1 30.6
Yuba County California 59.8 17.8
Adams County Colorado 68.4 23.9
Alamosa County Colorado 63.2 20.3
Arapahoe County Colorado 65.9 33.7
Archuleta County Colorado 82.9 17.0
Baca County Colorado 74.9 4.4
Bent County Colorado 67.4 8.9
Boulder County Colorado 63.9 28.2
Broomfield County Colorado 74.4 21.7
Chaffee County Colorado 76.9 7.4
Cheyenne County Colorado 80.5 5.6
Clear Creek County Colorado 81.3 11.8
Conejos County Colorado 75.7 4.1
Costilla County Colorado 74.6 7.3
Crowley County Colorado 74.0 6.5
Custer County Colorado 80.5 4.7
Delta County Colorado 74.3 5.3
Denver County Colorado 52.5 44.9
Dolores County Colorado 78.3 1.7
Douglas County Colorado 82.5 14.9
Eagle County Colorado 65.3 36.3
Elbert County Colorado 91.3 1.4
El Paso County Colorado 66.6 22.4
Fremont County Colorado 76.6 10.0
Garfield County Colorado 67.5 20.5
Gilpin County Colorado 71.8 11.3
Grand County Colorado 76.9 30.3
Gunnison County Colorado 59.1 28.8
Hinsdale County Colorado 83.5 3.6
Huerfano County Colorado 72.1 8.6
Jackson County Colorado 72.3 5.6
Jefferson County Colorado 71.9 24.7
Kiowa County Colorado 66.6 3.4
Kit Carson County Colorado 69.3 10.7
Lake County Colorado 66.9 17.7
La Plata County Colorado 69.1 18.7
Larimer County Colorado 67.5 21.4
Las Animas County Colorado 69.5 12.0
Lincoln County Colorado 70.8 11.8
Logan County Colorado 68.3 15.0
Mesa County Colorado 72.3 14.8
Mineral County Colorado 86.4 0.2
Moffat County Colorado 75.1 14.0
Montezuma County Colorado 72.8 9.5
Montrose County Colorado 74.6 8.9
Morgan County Colorado 67.3 12.1
Otero County Colorado 66.1 13.3
Ouray County Colorado 74.3 8.1
Park County Colorado 87.9 1.3
Phillips County Colorado 73.3 7.6
Pitkin County Colorado 62.5 40.0
Prowers County Colorado 66.7 14.9
Pueblo County Colorado 69.9 16.2
Rio Blanco County Colorado 74.1 13.8
Rio Grande County Colorado 78.9 8.9
Routt County Colorado 74.1 28.8
Saguache County Colorado 68.3 8.1

Associated Variables

  • The multi-unit and homeownership rates are said to be associated because the plot shows a discernible pattern.
    • The downward trend means the variables are negatively associated.
  • When two variables show some connection with one another, they are called associated variables.
  • If two variables are not associated, then they are said to be independent. That is, two variables are independent if there is no evident relationship between the two.
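
One way to put a number on the direction of an association is the Pearson correlation coefficient. These slides do not introduce it, so the sketch below is an illustration under that assumption and not the method used here. It uses the first eight Alabama counties from the table above, and the helper name `pearson_r` is hypothetical.

```python
from math import sqrt

# (multi_unit, homeownership) for the first eight Alabama counties above.
counties = [
    (7.2, 77.5),   # Autauga
    (22.6, 76.7),  # Baldwin
    (11.1, 68.0),  # Barbour
    (6.6, 82.9),   # Bibb
    (3.7, 82.0),   # Blount
    (9.9, 76.9),   # Bullock
    (13.7, 69.0),  # Butler
    (14.3, 70.7),  # Calhoun
]


def pearson_r(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


r = pearson_r(counties)
print(f"r = {r:.2f}")
```

A negative value of `r` matches the downward trend described above. Eight counties are far too few to characterize the whole data set, so treat the result as a check on the direction only.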

1.2.4 Explanatory and response variables

Suppose that \(X\) and \(Y\) are associated variables.

  • If variable \(X\) helps us explain or predict the value of variable \(Y\), we say that \(X\) is the explanatory variable and \(Y\) is the response variable.
  • Sometimes (not always) the explanatory variable affects the response variable, i.e., the change in one variable causes a change in the other.
    • Example: Does the median household income in a county cause its population size to change?

1.2.5 Observational studies and experiments

There are two primary types of data collection: experiments and observational studies.

  • In an experiment, researchers assign subjects to two or more groups and compare the outcomes.
    • In a randomized experiment, researchers randomly assign subjects to the groups (e.g., the stent study).
  • In an observational study, researchers collect data in a way that does not directly interfere with how the data arise (e.g., the county data set).
  • Beware: Association \(\neq\) Causation.
    • Observational studies on their own cannot establish causation (e.g., TVs predict life expectancy).
    • Well-designed randomized experiments can support causal conclusions.
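
Random assignment can be sketched as shuffling the subjects and splitting the shuffled list. The slides do not describe how the stent study carried out its randomization, so this is only one simple scheme. The function `randomize` and the patient IDs are hypothetical, and the group sizes are the 224 and 227 reported for the stent study.

```python
import random


def randomize(subject_ids, n_treatment, seed=None):
    """Randomly split subjects into a treatment group and a control group."""
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_treatment], shuffled[n_treatment:]


# 451 hypothetical patient IDs, split into groups of 224 and 227.
patients = range(1, 452)
treatment, control = randomize(patients, n_treatment=224, seed=1)

print(len(treatment), "patients in treatment")
print(len(control), "patients in control")
```

Because chance alone decides who lands in each group, neither the researchers nor the patients can steer particular kinds of patients toward the treatment. That is what makes a later comparison between the groups informative.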

Group Discussion

Air pollution

Researchers collected data to examine the relationship between air pollutants and preterm births in Southern California. During the study air pollution levels were measured by air quality monitoring stations. Specifically, levels of carbon monoxide were recorded in parts per million, nitrogen dioxide and ozone in parts per hundred million, and coarse particulate matter (PM\(_{10}\)) in \(\mu g/m^3\). Length of gestation data were collected on 143,196 births between the years 1989 and 1993, and air pollution exposure during gestation was calculated for each birth. The analysis suggested that increased ambient PM\(_{10}\) and, to a lesser degree, CO concentrations may be associated with the occurrence of preterm births.

  1. Identify the main research question of the study.
  2. Who are the subjects (observational units) in this study, and how many are included?
  3. What are the variables in the study? Identify each variable as numerical or categorical.
  4. Which is the explanatory variable, and which is the response?
  5. Was this study an experiment or an observational study?

Migraines and acupuncture

A migraine is a particularly painful type of headache, which patients sometimes wish to treat with acupuncture. To determine whether acupuncture relieves migraine pain, researchers conducted a randomized controlled study in which 89 individuals who identified as female and had been diagnosed with migraine headaches were randomly assigned to one of two groups: treatment or control. Forty-three (43) patients in the treatment group received acupuncture specifically designed to treat migraines. Forty-six (46) patients in the control group received placebo acupuncture (needle insertion at non-acupoint locations). Twenty-four (24) hours after patients received acupuncture, they were asked if they were pain free. Results are summarized in the contingency table below. Also provided is a figure from the original paper displaying the appropriate area (M) versus the inappropriate area (S) used in the treatment of migraine attacks.

             Pain free?
Group        No    Yes
Control      44    2
Treatment    33    10

  1. What percent of patients in the treatment group were pain free 24 hours after receiving acupuncture?

  2. What percent were pain free in the control group?

  3. In which group did a higher percent of patients become pain free 24 hours after receiving acupuncture?

  4. What are the explanatory and response variables in this study? Classify each as numerical or categorical.

  5. Your findings so far might suggest that acupuncture is an effective treatment for migraines for all people who suffer from migraines. However, this is not the only possible conclusion. What is one other possible explanation for the observed difference between the percentages of patients who are pain free 24 hours after receiving acupuncture in the two groups?